A strategy to quantify myofibroblast activation on a continuous spectrum

Myofibroblasts are a highly secretory and contractile cell phenotype that are predominant in wound healing and fibrotic disease. Traditionally, myofibroblasts are identified by the de novo expression and assembly of alpha-smooth muscle actin stress fibers, leading to a binary classification: “activated” or “quiescent (non-activated)”. More recently, however, myofibroblast activation has been considered on a continuous spectrum, but there is no established method to quantify the position of a cell on this spectrum. To this end, we developed a strategy based on microscopy imaging and machine learning methods to quantify myofibroblast activation in vitro on a continuous scale. We first measured morphological features of over 1000 individual cardiac fibroblasts and found that these features provide sufficient information to predict activation state. We next used dimensionality reduction techniques and self-supervised machine learning to create a continuous scale of activation based on features extracted from microscopy images. Lastly, we compared our findings for mechanically activated cardiac fibroblasts to a distribution of cell phenotypes generated from transcriptomic data using single-cell RNA sequencing. Altogether, these results demonstrate a continuous spectrum of myofibroblast activation and provide an imaging-based strategy to quantify the position of a cell on that spectrum.

2. Cell/Nucleus Perimeter: Approximated, by drawing a line through the center of each border pixel in the mask 3. Cell/Nucleus Major/Minor Axis Length: the major or minor axis of an ellipse fit to the mask 4. Cell/Nucleus Circularity: The ratio of mask area, to the area of a circle with the same perimeter as the mask. This gives a measure of how close to a perfect circle the cell shape is. A perfect circle has a circularity of 1 while a line has a circularity of 0 5. Cell/Nucleus Eccentricity: The eccentricity of an ellipse fit to the mask. The eccentricity is the ratio of the focal distance (distance between focal points) over the major axis length.
6. Cell/Nucleus Extent: The ratio of the area of a cell divided by the size of the smallest possible bounding box, or rectangle that can fully encompass the cell. Cell extent is a measure of how spread out a cell is 7. Nuc / Cyt ratio: The ratio of the area of the nucleus to the area of the cell.
8. Minkowski-Bouligand Dimension: A method of estimating the fractal dimension of the cell mask. This provides a measure of how rough the cell boundary is.
9. Mean Pearson's R: The Pearson's correlation coefficient between the a-SMA (green) and F-Actin (red) channels of the image. Pearson's R must be calculated over multiple pixels, therefore the image was first broken down into tiles of various sizes (4, 8, 16, 32, and 64 pixels). Pearson's R was then calculated for each tile, and all the tiles in an image averaged. A value of 0 corresponds to random / no correlation, while a value of 1 corresponds to perfect correlation. Figure S1. Colocalization of F-actin and alpha-SMA was quantified using the Pearson's correlation coefficient (RP). In Python, the image was first broken into a series of small tiles (Tile sizes of 50 x 50, 25 x 25, and 10 x 10 pixels shown). RP was then calculated for each tile (yellow high colocalization, purplelow colocalization). All cell containing tiles were then averaged to calculate an average colocalization for each cell. On average, activated cells had a significantly higher degree of colocalization across all tile sizes.    Figure S4. Illustration of the decision tree model and the "if" statements used in its code. Figure S5. To further compare our classification models to traditional measures of activation (a-SMA stress fiber formation), heatmaps were generated with individual cells colored by their Pearson's R coefficient. Figure S6. A traditional measure of myofibroblast activation is the association of a-SMA into stress fibers. In order to demonstrate that our manually engineered model agrees with this traditional classification method, another UMAP and PCA plot were constructed, containing the Pearson's R coefficient for the whole cell, in addition to the 4 properties listed in Figure 2A. This UMAP reduction shows a similar spectrum of activation, this time along the UMAP 1 axis. Specific axes have no meaning in UMAP reductions, so this trend matches that in figure 2C.  Figure S8. As a proof of concept, cardiac fibroblasts were cultured in TGFB+ media (complete DMEM, 10% FBS, 1% PS, 10 ng/mL TGFB), fixed, stained, and pushed through the same analysis pipelines (Figures 2A and 3A), to assign them labels in the same continuous systems reported in Figures 2 and 3. More information on this analysis can be found on the Github page.
In total 24 cells, grown in TGFB+ media, were analyzed. The control group for this study, 500 cells, imaged across 3 separate cultures, from the original dataset ( Figure 1A). Only the first 3 separate cultures were used, because they captured the true ratio of activated to non-activated cells. A) Both the MEM and the SSM capture an increase in average percent activation with the addition of TGFB, although the SSM model shows only a slight increase. B) The binary or traditional labeling system shows a significant increase in activation Figure S9. UMAP reduction of scRNA-seq analysis of 4,062 cells. Each individual cell is colored by its normalized expression levels of ACTA_2, the gene responsible for a-SMA production. Darker colors represent higher expression levels. Although less differentially expressed, this heatmap shows similar trends with other myofibroblast associated genes, increasing in the negative UMAP 1 direction. Table S3. List of genes significantly up and down regulated along PC 1 and PC 2 axes of the scRNA-seq analysis. Upregulated genes become more highly expressed in the positive direction of PC 1 and PC 2, while the opposite is true for downregulated genes. These genes contribute the most to the variance seen between all measured cells.